Japan Patent Office is not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer. So the translation may not reflect the original precisely. 

2. **** indicates a word that could not be translated. 
3. Words in the drawings are not translated. 



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to television conference equipment used for a television conference,
and to a video conference system that holds a television conference by communicating among multiple points.
[0002] 

[Description of the Prior Art] As shown in drawing 7, a conventional video conference system is built by connecting
television conference equipment through a communication network such as a LAN. The equipment consists of a personal
computer or workstation 703 to which peripheral devices are added: a system control station 704 with a built-in image
and voice codec, a camera 701, a monitor 702, a microphone 705, a video board 706, a LAN board 707, and so on. A
feature of this video conference system is that it can form a multimedia communication system that combines the
communication of images and voice with functions of the personal computer or workstation 703 such as multiple
windows, multitasking, data storage, and electronic mail.

[0003] In such a video conference system, human factors account for a large part of the technical design, and the
human interface is designed with the user's psychological factors in mind. Because a video conference system is
usually used for an exchange of opinions among two or more participants, a sense of presence and a sense of
stimulation become important elements.

[0004] Sensing this presence and stimulation requires eye contact with the conference partner. In conventional
television conference equipment, the camera 701 that photographs the user is therefore in many cases placed above
the monitor 702, in consideration of the gap between the partner's image on the screen and the user's line of sight.
This is because the tolerance for such a gaze gap is larger in the vertical (perpendicular) direction than in the
horizontal direction.
[0005] 

[Problem(s) to be Solved by the Invention] However, particularly in television conference equipment that uses a
large display (monitor), the tolerance for a gaze gap is small even with the arrangement described above. Unless the
user consciously looks at the camera 701, the line of sight in the face image of the partner shown on the monitor 702
is shifted in many cases. Consequently, even when a television conference is held with equipment of this
configuration, the naturalness of an ordinary conversation between people is lacking, and the presence and
stimulation of having the partner before one's eyes cannot be obtained.
[0006] This invention was made to solve the above problem. It aims to provide television conference equipment and a
video conference system that can hold a television conference with excellent naturalness, like talking while
actually meeting.
[0007] 

[Means for Solving the Problem] The invention of claim 1 is television conference equipment used for a television
conference, characterized by having: a display means to display an image; a photography means to photograph a user;
a display-control means to display the image of the other party of the television conference in a partner image
display field of the display means; a location detection means to detect the location of the partner image display
field in the display means; and a photography control means to change the location of the photography means
according to the location of the partner image display field detected by the location detection means.
[0008] The invention of claim 2 is television conference equipment used for a television conference, characterized
by having: a display means to display an image; a photography means to photograph a user; a display-control means to
display the image of the other party of the television conference on the display means; an image recognition means
to perform image recognition on the image of the other party; and a photography control means to change the location
of the photography means according to the display position of a predetermined part of the image of the other party
recognized by the image recognition means.

[0009] The invention of claim 3 is television conference equipment used for a television conference, characterized
by having: a display means to display an image; a photography means to photograph a user; a user location detection
means to detect the location of the user who spoke; and a photography control means to change the location or the
angle of the photography means according to the location of the speaking user detected by the user location
detection means.
[0010] The invention of claim 4 is a video conference system that holds a television conference by communicating
among multiple points, characterized by having: a display means to display an image; a photography means to
photograph the image on one's own side; a display-control means to display the image of the other party of the
television conference in a partner image display field of the display means; a location detection means to detect
the location of the partner image display field in the display means; and a photography control means to change the
location of the photography means according to the location of the partner image display field detected by the
location detection means.
[0011] 

[Function] According to the television conference equipment of claim 1 and the video conference system of claim 4,
the photography control means moves the photography means to, for example, the same location as the partner image
display field in the display means. When the user looks at the partner image display field in order to talk with the
partner, the user therefore looks at the photography means automatically, so a television conference with excellent
naturalness, like a dialogue held with lines of sight matched to the partner's, can be held.

[0012] According to the television conference equipment of claim 2, the photography control means moves the
photography means to the location of a part of the partner's image recognized by the image recognition means, for
example the face or the eyes. Particularly when a large display is used, the user then looks at the photography
means automatically when looking at the partner's image in order to talk with the partner. A television conference
with excellent naturalness, like a dialogue held with lines of sight matched to the partner's, can therefore be held
without producing a gaze gap with the partner.

[0013] According to the television conference equipment of claim 3, the user location detection means detects the
speaker's location, and the photography control means changes the location or angle of the photography means
according to that location. Even when there are two or more users on the own-device side, the photography means can
therefore be turned toward the speaker or moved to directly in front of the speaker, so a television conference with
excellent naturalness, like talking while meeting the partner, can be held.
[0014] 

[Embodiment of the Invention] Hereafter, examples of this invention are explained based on the drawings. Drawing 8 is
a block diagram showing the elements of this invention. In this drawing, the communications department 5 transmits
and receives images, voice, data, etc. to and from the television conference equipment of the other party through a
communication network. The image received by the communications department 5 is displayed on a display 3 by the
display and control section 4.

[0015] Meanwhile, the image of the user (one's own side) photographed by the camera 1, which constitutes the
photography means, is transmitted to the television conference equipment of the other party through the
communications department 5 and the communication network.

[0016] The display 3 shows various information, and shows the meeting partner's image in a partner image display
field 10 provided in a part of the display 3. This partner image display field 10 can be moved to an arbitrary
location on the display 3, and the size of its viewing area can be changed.

[0017] The location detecting element 6 detects the location of the partner image display field 10 on a display 3, and 
sends out the positional information to the camera control section 2. 

[0018] The image recognition section 7 recognizes the image of the partner displayed in the partner image display
field 10, and sends out positional information indicating the location of the person's face or eyes to the camera
control section 2.

[0019] The user location detecting element 9 detects a speaker's location from the location of the microphone 8 into 
which a speaker's voice was inputted, and sends out the positional information to the camera control section 2. 
[0020] The camera control section 2 is configured so that it can move the camera 1 along the three axes X, Y, and Z.
It moves the camera 1 to the point at which the user's line of sight is directed, according to the positional
information acquired from the location detecting element 6, the image recognition section 7, the user location
detecting element 9, and so on.

[0021] The camera control section 2 is also configured so that the angle of the camera 1 can be changed freely
according to the positional information acquired from the location detecting element 6, the image recognition
section 7, the user location detecting element 9, and so on.

[0022] The display and control section 4, the location detecting element 6, the image recognition section 7, and the
user location detecting element 9 described above are realized by the system control stations 109 and 409 described
later.

[0023] [1st example] Drawing 1 shows the configuration of the television conference equipment of the 1st example of
this invention. In this drawing, 109 is a system control station that controls the whole of this television
conference equipment. The voice detected by a microphone (not illustrated) and the image photographed by a camera 104
are processed by the image and voice codec inside the system control station 109, and are input through a video
board or the like into the computer 108, which is a personal computer or a workstation. By connecting this computer
108 and system control station 109 to other television conference equipment through means of communications such as
LAN or ISDN, a video conference system is built, and images, voice, information, etc. can be communicated among
multiple points.

[0024] 101 is a digitizer made of transparent members. It is configured so that coordinates can be input with a
pen-type pointing tool, and it is controlled by the digitizer control unit 107.
[0025] 102 is a transmissive liquid crystal display, and various kinds of data are displayed on this liquid crystal
display 102 as the monitor of a terminal unit.

[0026] 103 is a half mirror. It keeps the camera 104 from being visible to the user, and at the same time the image
processed by the image codec in the system control station 109, which is connected by means of communications such
as LAN, is projected onto it by the projection arrangement 106. The projected image is mainly a portrait image of the
partner of the television conference, and the user can see it as an image through the liquid crystal display 102 and
the digitizer 101.

[0027] The system control station 109 controls the display positions and the like so that the image of the various
information displayed on the liquid crystal display 102 under control of the computer 108 and the image projected by
the projection arrangement 106 are not seen by the user as overlapping.
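
The non-overlap condition above can be pictured as a rectangle test. The following is only a minimal sketch: the function name rects_overlap and the (x, y, width, height) rectangle layout are assumptions made for illustration, and the source does not say how the system control station 109 actually performs this check.

```python
def rects_overlap(a, b):
    """Return True if two (x, y, width, height) rectangles share any area.

    Hypothetical helper: 'a' could be a data window on the liquid crystal
    display and 'b' the projected portrait image area, both expressed in
    the same screen coordinates. Rectangles that only touch at an edge are
    treated as not overlapping.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```

A controller built on such a test could reposition one of the two areas whenever the function returns True.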

[0028] The location of the camera 104, the photography conditions, etc. are controlled by the camera control unit
105, and the camera 104 is moved according to the display position of the area onto which the projection arrangement
106 projects the image on the half mirror 103, i.e., the portrait image display window area (partner image display
field). The camera 104 is moved by three-dimensional positioning equipment commonly called an XYZ stage, and its
position is controlled by the coordinate values at which the portrait image display window is located.
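
As a rough illustration of controlling the stage position from the window coordinates, the sketch below converts the centre of a window given in screen pixels into millimetres on the plane of the stage. The names Window, window_center, and pixels_to_stage_mm, the pixel and millimetre units, and the assumption that the stage origin coincides with the top-left corner of the screen are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Portrait image display window in screen pixels (assumed layout)."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def window_center(win):
    """Centre of the window in screen pixels."""
    return (win.x + win.width / 2, win.y + win.height / 2)


def pixels_to_stage_mm(point_px, screen_px, screen_mm):
    """Scale a pixel coordinate to millimetres on the stage plane.

    Assumes the stage axes are aligned with the screen and share its origin.
    """
    px, py = point_px
    return (px * screen_mm[0] / screen_px[0], py * screen_mm[1] / screen_px[1])
```

For a 1280 x 960 pixel screen measuring 640 mm x 480 mm, a window at (100, 200) with size 320 x 240 has its centre at pixel (260, 320), which this scaling maps to stage position (130 mm, 160 mm).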

[0029] Drawing 2 shows how the camera 104 operates in conjunction with the portrait image display window area that
the projection arrangement 106 projects. Here, the size of the portrait image display window area is assumed to be
fixed.

[0030] If the user operates the digitizer 101 to move the portrait image display window area in Screen 201, which
consists of the liquid crystal display 102 and the half mirror 103, from the location of 202 to the location of 203,
the camera 104 is moved in conjunction with this movement of the window area to the location of 2… to 206 under
control of the camera control unit 105.

[0031] Therefore, when the user looks at the portrait image display window area, the user is looking at the camera
104.
[0032] Drawing 3 shows how the position of the camera 104 is controlled for a portrait image display window area
that can be expanded or reduced.

[0033] The portrait image display window area can be expanded and reduced arbitrarily by the user operating a pen or
a mouse. When the display position of the window area is moved and its size is expanded or reduced, the camera
control unit 105 controls the location of the camera 104 as follows.

[0034] First, if the portrait image display window area is about the size of the camera 104, the camera 104 is moved
to the center position of the portrait image display window area.

[0035] When the portrait image display window area is larger than the camera 104, the image recognition section in
the system control station 109 performs image recognition on the image of the person in the window area, and the
camera 104 is moved to the location of the face or eyes. That is, as shown in drawing 3, if the user moves the
portrait image display window area in Screen 301 from the location of 302 to the location of 303 and also expands it,
the camera control unit 105 moves the camera 104 from the location of 305 to the location of 306. This location 306
is the location of the partner's face or eyes identified by image recognition as described above. Therefore, even
when the movement of the user's line of sight is large, as when a large display (screen) is used, the user comes to
look at the camera 104 when looking at the partner's portrait image. Because image recognition need only be applied
to the image projected by the projection arrangement 106, no load is placed on the section that processes the image
displayed on the liquid crystal display 102. And since the user automatically looks at the camera 104 when looking
at the partner's image, a video conference system with excellent naturalness, like talking while actually meeting,
can be provided.
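
The two cases in paragraphs [0034] and [0035] can be sketched as a single selection rule. This is an illustrative sketch only: the function name camera_target, the tolerance factor used to decide that a window is "about the size" of the camera, and the fallback to the window centre when no face position is available are assumptions that the source does not specify.

```python
def camera_target(window, camera_size, face_center=None, tolerance=1.5):
    """Choose the screen position the camera should move to.

    window:      (x, y, width, height) of the portrait image display window.
    camera_size: (width, height) of the camera body, in the same units.
    face_center: (x, y) of the recognized face or eyes, or None if unknown.
    tolerance:   hypothetical factor defining "about the size of the camera".
    """
    x, y, w, h = window
    cam_w, cam_h = camera_size
    center = (x + w / 2, y + h / 2)
    roughly_camera_sized = w <= cam_w * tolerance and h <= cam_h * tolerance
    # Small window: aim at its centre. Also fall back to the centre
    # (an assumption) when recognition has not produced a face position.
    if roughly_camera_sized or face_center is None:
        return center
    # Large window: aim at the recognized face or eyes.
    return face_center
```

With a 50 x 50 camera, a 60 x 60 window yields its own centre, while a 400 x 300 window with a face recognized at (210, 90) yields (210, 90).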

[0036] [2nd example] In the 1st example above, the display that shows the partner's image (the portrait image display
window area) and the display that shows data and the like (liquid crystal display 102) were configured separately.
This example describes the case where the partner's image, data, etc. are displayed by the same display processing.
Parts identical to those in the example of drawing 1 are given the same reference signs, and their explanation is
omitted.

[0037] In drawing 4, 402 is a transmissive liquid crystal display whose display is controlled by the liquid crystal
control unit 406. It displays both the various kinds of data and the partner's image, which are processed together
by the system control station 409. 403 is a member for keeping the camera 104 from being visible to the user.

[0038] The location of the camera 104 is controlled by the camera control unit 105 according to the contents of the
image displayed on the liquid crystal display 402. That is, the system control station 409 recognizes the window of
the partner's image (partner image display field) from the image displayed on the liquid crystal display 402, and
then performs image recognition of the partner's face or eyes from the image in that window area. The positional
information of the recognized face or eyes is input into the camera control unit 105, which, based on that
positional information, moves the camera 104 to the location of the meeting partner's face or eyes displayed in the
window of the partner's image.
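
The staged recognition described in paragraph [0038] can be sketched as two steps followed by a coordinate conversion. In the sketch below, find_window and find_face are hypothetical callables standing in for the recognition performed by the system control station 409, the convention that the face box is reported relative to the window is an assumption, and the fallback to the window centre is likewise not stated in the source.

```python
def locate_partner_face(frame, find_window, find_face):
    """Return the screen position of the partner's face, or None.

    find_window(frame) -> (x, y, w, h) of the partner window, or None.
    find_face(frame, window) -> (x, y, w, h) of the face relative to the
    window's top-left corner, or None. Both are placeholders supplied by
    the caller; no particular recognition method is implied.
    """
    window = find_window(frame)          # stage 1: find the partner window
    if window is None:
        return None
    wx, wy, ww, wh = window
    face = find_face(frame, window)      # stage 2: find the face inside it
    if face is None:
        return (wx + ww / 2, wy + wh / 2)   # assumed fallback: window centre
    fx, fy, fw, fh = face
    # Convert the face centre from window-relative to screen coordinates.
    return (wx + fx + fw / 2, wy + fy + fh / 2)
```

The returned position is what would be handed to the camera control unit 105 as the target for the camera 104.
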
[0039] Thus, in this example, image recognition of the whole display screen (liquid crystal display 402) is performed
in stages, and the movement of the camera 104 is controlled based on the result. Since the user automatically looks
at the camera 104 when looking at the partner's image on the screen of the display 402, a video conference system
with excellent naturalness, like talking while actually meeting, can be provided.
[0040] [3rd example] Next, based on drawing 5 and drawing 6, the case where a television conference is held between
partners at two or more points and two or more users is described.

[0041] In this case, as shown in drawing 6, two or more camera control units 105 and cameras 104 are arranged to
correspond to the portrait image display window areas in which the images of the partners at two or more points are
displayed, and microphones 1, 2, 3, ... equal in number to the users on the own-device side are arranged to
correspond to each user. In drawing 6, "display side" refers to the Screen 501 side of drawing 5, and "user side"
refers to the location where the users shown at 510 in drawing 5 are seated.
[0042] In this configuration, when a user speaks, the voice is detected by the microphone assigned to that user, and
the detection information is sent to the system control station 109 (409). The system control station 109 (409)
identifies the speaker's location based on the positional information of each microphone registered beforehand, and
sends that positional information to the camera control unit 105 together with the positional information of the
face or eyes of the person in the portrait image display window area, recognized by image recognition.
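
Identifying the speaker from pre-registered microphone positions can be sketched as a table lookup. The coordinates in MIC_POSITIONS, the use of a per-microphone level reading, the loudest-microphone rule, and the threshold value are all assumptions made for illustration; the source only states that the location is determined from the registered position of the microphone that picked up the voice.

```python
# Hypothetical registry: microphone name -> (x, z) position in metres,
# with x along the display width and z the distance from the display.
MIC_POSITIONS = {"A": (-0.8, 1.5), "B": (0.0, 1.5), "C": (0.8, 1.5)}


def locate_speaker(levels, positions, threshold=0.1):
    """Return the registered position of the loudest active microphone.

    levels:    mapping of microphone name -> measured input level.
    positions: mapping of microphone name -> registered position.
    threshold: assumed minimum level that counts as speech.
    Returns None when no microphone exceeds the threshold.
    """
    if not levels:
        return None
    mic = max(levels, key=levels.get)
    if levels[mic] < threshold:
        return None
    return positions[mic]
```

For example, with levels of 0.02, 0.05, and 0.6 on microphones A, B, and C, the lookup returns the registered position of microphone C.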

[0043] Using these pieces of positional information, the camera control unit 105 moves each camera 104 to the
location of the face or eyes of the person in the portrait image display window area, and changes the angle of the
camera 104 so that it is turned toward the speaker. Alternatively, the location of the camera 104 may be fixed, and
only the angle of the camera 104 may be changed, using only the speaker's positional information.
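
Turning a camera toward the speaker amounts to computing an angle from two positions. The sketch below computes only a horizontal (pan) angle and assumes a coordinate convention that is not in the source: x runs along the display width, z is the distance from the display toward the users, and 0 degrees means the camera points straight out from the display. The function name pan_angle_deg is hypothetical.

```python
import math


def pan_angle_deg(camera_xz, speaker_xz):
    """Horizontal angle, in degrees, from the camera to the speaker.

    Positive angles turn toward increasing x; 0 means straight ahead
    (perpendicular to the display). Both arguments are (x, z) pairs in
    the same length unit.
    """
    cam_x, cam_z = camera_xz
    spk_x, spk_z = speaker_xz
    return math.degrees(math.atan2(spk_x - cam_x, spk_z - cam_z))
```

A camera at (0, 0) facing a speaker at (1, 1) would be turned about 45 degrees, and a speaker directly in front at (0, 2) gives 0 degrees. When several cameras sit behind different window areas, the same function can be applied once per camera position.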

[0044] Drawing 5 illustrates the configuration of drawing 6 concretely. As shown in drawing 5, windows 502 and 503,
which display data such as graphs, are displayed on Screen 501, and the portrait images of the meeting participants
at point A, point B, and point C, the partners of the television conference, are displayed in the portrait image
display window areas 504, 505, and 506. A camera 104 is arranged behind each of these window areas 504, 505, and
506.

[0045] Meanwhile, the users on the own-device side are three persons, A to C. These three are seated as shown at 510
in drawing 5, and microphones A to C are placed in front of users A to C, respectively.

[0046] Now, suppose User C speaks. Microphone C detects the voice of User C. Based on this detection, the system
control station 109 (409) identifies the location of speaker C based on the positional information of each
microphone registered beforehand, and sends that positional information to the camera control unit 105 together with
the positional information of the face or eyes of the person in each portrait image display window area, recognized
by image recognition. Using these pieces of positional information, the camera control unit 105 moves each camera
104 to the location of the face or eyes of the person in its portrait image display window area, and changes the
angle of each camera 104 so that it is turned toward Speaker C, as shown by arrows 507, 508, and 509.

[0047] Thus, when there are two or more users on the own-device side, the camera 104 is turned toward the user who
spoke, so a video conference system with excellent naturalness, like talking while actually meeting, can be
provided.

[0048] Although the partners were at three points A, B, and C in the example of drawing 5, there may be any number
of points. Also, instead of providing a camera 104 for each portrait image display window area, a single camera 104
may be used, configured so that it is moved to a location directly in front of the user who spoke or turned toward
the user who spoke.

[0049] The above examples describe television conference equipment of this invention built by combining the camera
104 and other components with a personal computer or a workstation. However, the television conference equipment of
this invention may instead be built as a special-purpose machine that uses a CPU in place of the personal computer or
the workstation.

[0050] Moreover, in the 1st and 2nd examples above, the camera control unit 105 may also change the angle of the
camera 104 automatically after the camera 104 is moved, so that the user's line of sight directly faces the lens of
the camera 104. Doing so completely eliminates the gaze gap between the user and the partner and makes an even more
natural dialogue possible.
[0051] 

[Effect of the Invention] As explained above, the television conference equipment and the video conference system of
this invention make it possible to hold a television conference with excellent naturalness, like talking while
actually meeting the conference partner.

[Translation done.] 